Practical language modeling: an interpolating method

Authors

  • Xiaohu Liu
  • Douglas D. O'Shaughnessy
Abstract

Language modeling is a key component of speech and handwriting recognition, and N-gram models are the formalism of choice across a wide range of domains. Although a high order N can greatly reduce perplexity, in many practical cases it is unrealistic to obtain statistically reliable N-grams. We propose an interpolated model that introduces signal words and clue words into the baseline N-gram model. The initial word of a word pair with high mutual information is chosen as a signal word. Similarly, we define clue words as words that have high mutual information with a certain morphological form. In a given context, we select the signal word with the highest score to compute the probability of the current word, and the clue word with the highest score to estimate the probability of the form of the current word. We discuss the basic requirements for designing an interpolating language model and show how our models satisfy them. Compared with the baseline model, we obtained a considerable reduction in perplexity. Because both signal words and clue words are easy to collect and handle, the proposed method is practical.
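The abstract does not give the scoring or interpolation formulas, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes pointwise mutual information over ordered word pairs within a fixed window as the "mutual information" score, and a fixed linear interpolation weight. The names `pmi_table`, `best_signal`, `interpolate`, `window`, and `lam` are hypothetical.

```python
import math
from collections import Counter


def pmi_table(sentences, window=5):
    """Pointwise mutual information for ordered pairs (w1 precedes w2 within `window`).

    Assumption: PMI stands in for the paper's mutual information score.
    """
    unigrams = Counter()
    pairs = Counter()
    for sent in sentences:
        unigrams.update(sent)
        for i, w1 in enumerate(sent):
            for w2 in sent[i + 1 : i + 1 + window]:
                pairs[(w1, w2)] += 1
    n_uni = sum(unigrams.values())
    n_pair = sum(pairs.values())
    table = {}
    for (w1, w2), count in pairs.items():
        p_pair = count / n_pair
        p1 = unigrams[w1] / n_uni
        p2 = unigrams[w2] / n_uni
        table[(w1, w2)] = math.log(p_pair / (p1 * p2))
    return table


def best_signal(context, word, table):
    """Return the context word with the highest PMI toward `word`, or None."""
    scored = [(table[(c, word)], c) for c in context if (c, word) in table]
    return max(scored)[1] if scored else None


def interpolate(p_ngram, p_signal, lam=0.7):
    """Linear interpolation of two probability estimates (hypothetical weight `lam`)."""
    return lam * p_ngram + (1.0 - lam) * p_signal


# Toy usage on a two-sentence corpus.
corpus = [["stock", "market", "fell"], ["stock", "market", "rose"]]
table = pmi_table(corpus)
signal = best_signal(["the", "stock"], "market", table)
p = interpolate(p_ngram=0.2, p_signal=0.6, lam=0.5)
```

Linear interpolation keeps the result a valid probability as long as both inputs are probabilities and `lam` lies in [0, 1], which is one of the basic properties an interpolating model would need. In practice the weight would typically be tuned on held-out data rather than fixed.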


Similar resources

A Method for Local Interpolation with Tension Trigonometric Spline Curves and Surfaces

In this work a family of tension trigonometric curves analogous to those of cubic Bézier curves is presented. Some properties of the proposed curves are discussed. We propose an efficient interpolating method based on the tension trigonometric splines with various properties, such as partition of unity, geometric invariance and convex hull property, etc. This new interpolating method is applied...


An Executive Approach Based On the Production of Fuzzy Ontology Using the Semantic Web Rule Language Method (SWRL)

Today, the need to deal with ambiguous information in semantic web languages is increasing. Ontology is an important part of the W3C standards for the semantic web, used to define a conceptual standard vocabulary for the exchange of data between systems, the provision of reusable databases, and the facilitation of collaboration across multiple systems. However, classical ontology is not enough ...


Latent Semantic Modeling and Smoothing of Chinese Language

Language modeling plays a critical role for automatic speech recognition. Typically, the n-gram language models suffer from the lack of a good representation of historical words and an inability to estimate unseen parameters due to insufficient training data. In this study, we explore the application of latent semantic information (LSI) to language modeling and parameter smoothing. Our approach...


Modeling and Analysis of Qualitative Systems Based on a New Fuzzy Inference Approach

Qualitative modeling of technical processes may be accomplished by dynamic fuzzy systems. A new inference method with interpolating rules is proposed as an essential basis for the analysis of this class of systems. Using this approach, the system output is dependent on both an interpolating rule derived from the fuzzy input and the fuzzy input itself. A simple example shows the typical behavior...


On latent semantic language modeling and smoothing

Language modeling plays a critical role for automatic speech recognition. Conventionally, the n-gram language models suffer from lacking good representation of historical words and estimating unseen parameters from insufficient training data. In this work, the latent semantic information is explored for language modeling and parameter smoothing. In language modeling, we present a new representa...




Publication date: 2000